feature: Add common TranscriptionModel interface for audio transcription #1484

Conversation

mudabirhussain
Contributor

  • Created TranscriptionModel interface that extends Model<AudioTranscriptionPrompt, AudioTranscriptionResponse>
  • Implemented call(AudioTranscriptionPrompt) method for better compatibility between OpenAI and Azure OpenAI transcription models
  • Added default convenience methods for handling Resource and AudioTranscriptionOptions to return transcription as a String
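A minimal sketch of the shape described above, not the code from this PR. The stand-in types (`Model`, `AudioTranscriptionPrompt`, `AudioTranscriptionOptions`, `AudioTranscriptionResponse`) are simplified here so the example compiles on its own, a `byte[]` replaces Spring's `Resource`, and the method name `transcribe` and the `EchoTranscriptionModel` class are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

public class TranscriptionSketch {

    // Simplified stand-in for the generic Model<TReq, TRes> contract.
    interface Model<TReq, TRes> {
        TRes call(TReq request);
    }

    // Hypothetical, stripped-down stand-ins for the real request/response types.
    record AudioTranscriptionOptions(String language) {}
    record AudioTranscriptionPrompt(byte[] audio, AudioTranscriptionOptions options) {}
    record AudioTranscriptionResponse(String text) {}

    // The common interface: one abstract call(...) plus default convenience methods.
    interface TranscriptionModel extends Model<AudioTranscriptionPrompt, AudioTranscriptionResponse> {

        @Override
        AudioTranscriptionResponse call(AudioTranscriptionPrompt prompt);

        // Convenience overload: no options (assumed name, byte[] instead of Resource).
        default String transcribe(byte[] audio) {
            return transcribe(audio, null);
        }

        // Convenience overload: wraps the inputs in a prompt and unwraps the text.
        default String transcribe(byte[] audio, AudioTranscriptionOptions options) {
            return call(new AudioTranscriptionPrompt(audio, options)).text();
        }
    }

    // Fake provider used only to exercise the default methods.
    static final class EchoTranscriptionModel implements TranscriptionModel {
        @Override
        public AudioTranscriptionResponse call(AudioTranscriptionPrompt prompt) {
            String lang = prompt.options() == null ? "auto" : prompt.options().language();
            String text = new String(prompt.audio(), StandardCharsets.UTF_8);
            return new AudioTranscriptionResponse("[" + lang + "] " + text);
        }
    }

    public static void main(String[] args) {
        TranscriptionModel model = new EchoTranscriptionModel();
        byte[] audio = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(model.transcribe(audio)); // prints "[auto] hello"
    }
}
```

Because the convenience methods are defaults that delegate to `call(...)`, a provider-specific implementation only has to supply `call(...)`, and callers can depend on the interface rather than on a concrete OpenAI or Azure OpenAI class.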

Resolves issue #1478.
Member

@habuma habuma left a comment

This looks pretty much like what I had in mind. Well done.

@piotrooo
Contributor

piotrooo commented Oct 7, 2024

Meanwhile, as an enhancement to transcription, take a look at #1278.

@@ -0,0 +1,22 @@
package org.springframework.ai.model;
Contributor

Should this interface be in the package org.springframework.ai.model.audio.transcription?

Member

Yes, it should.

Model interfaces should be placed in packages that reflect their functional domain:

  • For single-level domains:
    org.springframework.ai.<domain>

  • For hierarchical domains:
    org.springframework.ai.<category>.<subdomain>

| Model Interface | Package Location |
| --- | --- |
| EmbeddingModel | org.springframework.ai.embedding |
| ModerationModel | org.springframework.ai.moderation |
| TextToSpeechModel | org.springframework.ai.audio.tts |

@kpavlov

kpavlov commented Oct 22, 2024

Tests should be added too.
How did the build pass without test coverage? 🤔

@markpollack markpollack added this to the 1.0.x milestone May 6, 2025
@markpollack markpollack added the enhancement New feature or request label Jul 11, 2025
@markpollack
Member

Is there another transcription model we can use to verify the abstraction?

@markpollack
Member

I've added the additional tests to check the default methods etc.

Merged in 4cf2377

Thanks @mudabirhussain and others.

@markpollack markpollack self-assigned this Jul 14, 2025